Methods and apparatus for determining if a search query should be issued

ABSTRACT

Methods and apparatus for assessing, ranking, organizing, and presenting search results associated with a user's current work context are disclosed. The disclosed system assesses, ranks, organizes, and presents search results against a user's current work context by comparing statistical and heuristic models of the search results to a statistical and heuristic model of the user's current work context. In this manner, search results are assessed, ranked, organized, and/or presented with the benefit of attributes of the user's current work context that are predictive of relevance, such as words in a user's document (e.g., web page or word processing document) that may not have been included in the search query. In addition, search results from multiple search engines are combined into an organization scheme that best reflects the user's current task. As a result, lists of search results from different search engines can be more usefully presented to the user.

PRIORITY CLAIM

This application claims the benefit of U.S. Provisional Patent Application No. 60/764,041, filed Jan. 31, 2006, entitled “Methods and Apparatus For Ranking, Organizing, and Presenting Search Results.”

TECHNICAL FIELD

The present disclosure relates in general to searching computerized information repositories and, in particular, to methods and apparatus for assessing, ranking, organizing, and presenting search results associated with a user's current work context.

BACKGROUND

Many people use a variety of different computer-based information sources such as search engines (e.g., Google™, MSN®, Yahoo!®, etc.) to find information they are seeking. Typically, users are looking for information relevant to a work task in which they are currently engaged. For example, a user may be interested in information related to a topic already displayed by a web browser, or a user may be interested in information related to a document they are currently working on (e.g., a word processing document). Typically, the user enters a query into an input box, and the search engine examines data associated with thousands of documents. The search engine then sends the user a list of search results. In an effort to help users find relevant information quickly, most information sources rank search results for presentation to the user, thereby reducing the user's need to wade through a long list of search results. For example, documents that a search engine determines to be most relevant to the user's query are typically placed first in a list of search results.

Typically, search engines use some form of term frequency-inverse document frequency (TF/IDF) ranking algorithm or some similar method to determine this presentation order or other organization scheme. TF/IDF scores documents in direct proportion to the number of query terms present in the document and in inverse proportion to some function of the number of times the query terms appear in the information repository as a whole. In other words, documents with many occurrences of rare query terms are ranked highly. In addition, other factors may be used to rank the documents, such as the number of times other documents reference that document. Search engines might also display the documents retrieved based on data associated with the retrieved documents. For example, documents labeled with the same subject area might be presented in the same folder.

One problem with this method of ranking, organizing, and presenting retrieved documents when seeking information related to a user's current work context is that the query terms alone are used to assess the relevance of the search results in the course of retrieval. However, most search engines place limitations on the length of the query and/or limitations on other aspects of the manner in which the search may be specified (e.g., the types of constraints that may be specified on desired results). For example, a search engine may limit the number of terms in a query to five, or the search engine may not provide a method for specifying a date range. In general, however, the user's current context is typically too complex to be represented in such a compressed and simplified form. For example, if the document the user is currently working on (an important aspect of the user's context) has more than five relevant terms, but a search engine only accepts queries that are five words long, the query alone is not necessarily the best representation of the user's current work context with which to assess relevance, since the user's current document (e.g., web page or word processing document) contains information beneficial to assessing the relevance of a search result that is not easily communicated to the search engine in the form of a query. Other properties of the user's current work context, for example, their task (e.g., drafting a legal document), stage in that task, their role in an organization (e.g., lawyer), the nature of that organization (e.g., a law firm), specified areas of interest (e.g., patents), the application in which they are working (e.g., a word processor), the document genre or type (e.g., legal brief or resume), or their past behavior, might also be important aspects of assessing the relevance of a given search result. Therefore, assessing, ranking, organizing, and presenting search results associated with the user's context simply using a query acceptable to a given search engine may not produce the best results.

Moreover, as described above, the user's current document by itself typically does not constitute the entire user context in terms of which relevance of information should be assessed. Other factors, including, but not limited to, the user's task, the state of that task, the organization for which the work is being performed, the user's role in that organization, explicit user indications, the application in which the user is working on the document, the document genre, etc., may also be important in determining a ranking, organization, and presentation of search results that truly reflects the user's information needs.

Consider, for example, the task of writing a scientific research paper. Presentations to others may be given before the work is more broadly published. Therefore, at the beginning of the writing task, it may be useful to assemble information by the author that very closely matches the first drafts of the paper, so that those prior writings may be reused. Later in the process, when the author is assembling related work, it may be desirable to relax those constraints so as to provide a broader, more complete set of search results. In this example, the stage and type of task influence the character of the search results desired. However, it may not be possible to specify this directly to a typical search engine.

In addition, the best strategy for presenting information should be determined. For example, while composing an electronic mail message, prior messages sent to and/or received from the recipients of the current message may be retrieved. These messages may be presented next to the email editor window, organized under headers labeled by the name of the recipient. Messages under each header may also be organized in a ranked list, where items at the top of the list are ordered from most to least similar to the contents of the body of the message being composed. The system may also draw icons next to each email recipient indicating the presence of the additional information. When the user moves his/her mouse over those icons, the system may present the best matching email, so as to give the user a preview of the available information. In contrast, while shopping online and viewing a product, information might be displayed in a window next to the user's web browser, organized in categories. Reviews of that product may be organized in one category, accessories in another category, and prices under yet another category. An improved search system should be able to determine how to present information to the user using a strategy that works best for the work context in which the user is currently engaged.

Another problem with relying solely on the rankings or organization schemes provided by search engines themselves occurs when querying multiple information sources. Different information sources typically do not use the same scoring algorithm in determining what to return and what order to return it in, or in determining how to organize and present these results. As a result, ranking and/or organizing scores associated with results from different search engines (if returned to the requester of the search at all) typically cannot reliably be used to combine multiple result lists into a combined results list. This is typically acceptable only if information from different information sources is presented under different headings (e.g., one heading for each information source). If, however, headings are defined functionally or by content rather than just by information source, then a common assessment, ranking, organization, and presentation system may be needed in order to determine which results would be most useful to the user, which results should be presented to the user, and how the results should be organized and presented to the user (e.g., in what order). Similarly, if a unified view of information from a variety of information sources is desired, a common assessment, ranking, organization, and presentation system may be needed.

SUMMARY

The system described herein solves these problems by automatically generating search queries based on the user's current work context. For example, a user's work context may include different aspects such as text associated with a website or a word processing document, as well as a task associated with the user, such as the task of “budgeting.” The user's current work context may include the document the user is currently working on (e.g., a web page or a word processing document) as well as other variables as described herein. The system disclosed herein then automatically searches, assesses, ranks, organizes, and presents the search results based on a rich model of the user's current work context instead of simply relying on the user-entered search queries and the search engine's assessments, rankings, etc., because the search engine's assessments, rankings, etc., are based on the much more limited search query provided to the information source. In certain embodiments, the system described herein accomplishes this by comparing statistical and heuristic models of the search results to a statistical and heuristic model of the user's current work context, including the document currently being manipulated by the user. As described in detail below, this is an improvement over existing search engines (e.g., Google™, MSN®, Yahoo!®, etc.).

The first problem is solved because search queries are automatically generated each time the user's current context changes (and/or periodically), and the limitations each search engine places on the query or results format and expressiveness are not also limitations on the algorithms that may be used to assess, rank, organize, and present search results. For example, such algorithms may represent the user's current work context using more than five terms, or using features of the user's work context other than just terms of the sort usable in search queries. For example, the search results may be ranked with the benefit of other words in the user's current document that may not have been included in the search query. For example, a search engine query may be limited to the terms “dog” and “cat,” but a particular search result and the user's current document may also contain the word “mouse,” making that search result potentially more relevant than another search result that contains the words “dog” and “cat” but does not contain the word “mouse.” Other features, such as the task the user is currently performing in a desktop application, may be used to inform the ranking and presentation of search results. For example, if the user is viewing a contact in a personal information management application such as Microsoft Outlook®, home pages for the contact person might be ranked more highly than other retrieved documents, and could be presented in a separate folder in the list of search results retrieved.

The second problem is solved because search results from multiple search engines can be analyzed and organized together by the same algorithm, based on the same information about the user's current work context. For example, boldface words in a current word processing document may be given additional ranking weight, and search results from different search engines can be usefully compared with each other in terms of potential relevance to the user's current work context and so, for example, meaningfully combined into a single ranked list or other unified presentation scheme, which may itself be determined by the user's current work context.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a high level block diagram of an example communications system.

FIG. 2 is a more detailed block diagram showing one example of a client device.

FIG. 3 is a more detailed block diagram showing one example of a context based search system.

FIG. 4 is a message diagram showing an example communications exchange between a client device, a context based search system, and a plurality of information sources.

FIG. 5 is a flowchart of an example process for obtaining and ranking search results.

FIG. 6 is a screen shot showing an example user document and an example search results side bar with ranked search results.

FIG. 7 is a screen shot showing an example search results web page from one information source.

FIG. 8 is a screen shot showing an example search results web page from another information source.

FIG. 9 is a screen shot showing an example search results side bar in accordance with an embodiment of the present system.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

The present system is most readily realized in a network communications system. A high level block diagram of an exemplary network communications system 100 is illustrated in FIG. 1. The illustrated system 100 includes one or more client devices 102, one or more routers 106, a plurality of different information sources 108 including database servers 110 and/or databases 112, and one or more context based search systems 114. Each of these devices may communicate with each other via a connection to one or more communications channels 116 such as the Internet and/or some other data network, including, but not limited to, any suitable wide area network or local area network. It will be appreciated that any of the devices described in the examples herein may be directly connected to each other instead of over a network. In addition, any combination of devices described in the examples herein may be embodied in a single device.

The information sources 108 store a plurality of files, programs, and/or web pages in one or more databases 112 for use by the client devices 102. For example, a database server 110 may be associated with a publicly available search engine such as Google™, MSN®, or Yahoo!®. In addition, a database server 110 may include commercial databases such as Lexis® and Westlaw®. Still further, a database server 110 may be a local database server such as a corporate intranet server. The databases 112 may be connected directly to the database servers 110 and/or via one or more network connections.

Data from the information sources 108 that is relevant to content in documents displayed on the client devices 102 is sent to the client devices 102 via the communications channel 116. For example, a user of a client device 102 may be viewing a web page related to an automobile, and the client device 102 may receive a list of hyperlinks to other web pages related to that automobile. In one embodiment, the information sources 108 communicate directly with each client device 102. In other embodiments, the information sources 108 communicate with the client devices 102 via a search system 114.

One information source 108 and/or one search system 114 may interact with a large number of other devices. Accordingly, each information source 108 and/or search system 114 is typically a high end computer with a large storage capacity, one or more fast microprocessors, and one or more high speed network connections. Conversely, relative to a typical server 110 (or, in some embodiments, system 114), each client device 102 typically includes less storage capacity, a single microprocessor, and a single network connection.

A more detailed block diagram of the electrical systems of an example client device 102 is illustrated in FIG. 2. Although the electrical systems of different client devices 102 may be similar, the structural differences between these devices are well known. For example, a typical handheld client device 102 is small and lightweight compared to a typical personal computer 102.

The example client device 102 includes a main unit 202 which preferably includes one or more processors 204 electrically coupled by an address/data bus 206 to one or more memory devices 208, other computer circuitry 210, and one or more interface circuits 212. The processor 204 may be any suitable processor, such as a microprocessor from the INTEL PENTIUM® family of microprocessors. The memory 208 preferably includes volatile memory and non-volatile memory. Preferably, the memory 208 stores a software program that interacts with the other devices in the system 100 as described below. This program may be executed by the processor 204 in any suitable manner.

In this example, the memory 208 includes a context generation module 224, a query generation module 226, a result modeling module 228, an assessment, organization, and ranking module 230, and a search result display module 232. The context generation module 224 examines documents (e.g., web pages, e-mails, word processing documents, slide presentations, spreadsheets, etc.) and other variables (e.g., user task and task state, application type, document genre, user role, etc.) to create context models as described below. The query generation module 226 forms multiple information queries targeted to multiple information sources 108 as described in detail below. The result modeling module 228 examines search results (e.g., summaries, web pages, documents, etc.) to create search result models as described in detail below. The assessment, organization, and ranking module 230 compares search result models to original context models to assess, rank, and organize search results from single or, more usually, multiple information sources 108 as described in detail below. The search result display module 232 displays ranked and organized search results received from the search system 114 to the user (e.g., in a sidebar to the associated document) as described below. The memory 208 may also store other information such as digital data indicative of documents, files, programs, web pages, etc. retrieved from another computing device and/or loaded via an input device 214.

The interface circuit 212 may be implemented using any suitable interface standard, such as an Ethernet interface and/or a Universal Serial Bus (USB) interface. One or more input devices 214 may be connected to the interface circuit 212 for entering data and commands into the main unit 202. For example, the input device 214 may be a keyboard, mouse, touch screen, track pad, track ball, isopoint, and/or a voice recognition system.

One or more displays, printers, speakers, and/or other output devices 216 may also be connected to the main unit 202 via the interface circuit 212. The display 216 may be a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma device, or any other type of display. The display 216 generates visual displays of data generated during operation of the client device 102. For example, the display 216 may be used to display search results received from the search system 114, including data from multiple information sources 108. The visual displays may include prompts for human input, run time statistics, calculated values, data, etc.

One or more storage devices 218 may also be connected to the main unit 202 via the interface circuit 212. For example, a hard drive, CD drive, DVD drive, a flash device, and/or other storage devices may be connected to the main unit 202. The storage devices 218 may store any suitable type of data. The client device 102 may also exchange data with other network devices 220 via a wireless transceiver 222 and/or a connection to the network 116. The network connection may be any suitable type of network connection, such as an Ethernet connection, digital subscriber line (DSL), telephone line, coaxial cable, etc.

In some embodiments, a context based search system 114 is used. A more detailed block diagram of a context based search system 114 is illustrated in FIG. 3. A main unit 302 in the search system 114 preferably includes a processor 304 electrically coupled by an address/data bus 306 to a memory device 308 and a network interface circuit 310. The network interface circuit 310 may be implemented using any suitable data transceiver, such as an Ethernet transceiver. The processor 304 may be any suitable type of well known processor, and the memory device 308 preferably includes volatile memory and non-volatile memory. Preferably, the memory device 308 stores a software program that implements all or part of the method described below.

In particular, the memory preferably stores a query generation module 312, a result modeling module 314, and an assessment, ranking, and organization module 316. The query generation module 312 forms multiple search queries targeted to multiple information sources 108 as described in detail below. The result modeling module 314 examines search results (e.g., summaries, web pages, documents, etc.) to create search result models as described in detail below. The assessment, ranking, and organization module 316 compares search result models to original context models to rank and organize search results from multiple information sources 108 as described in detail below. These software modules 312, 314, 316 may be executed by the processor 304 in a well known manner. However, some of the steps described in the method below may be performed manually and/or without the use of the search system 114. The memory device 308 and/or a separate database 318 also store files, programs, web pages, etc. for use by other servers 110 and/or client devices 102.

Users of the system 100 may be required to register with the search system 114. In such an instance, each user may choose a user identifier (e.g., e-mail address) and a password which may be required for the activation of services. The user identifier and password may be passed across the network 116 using encryption built into the user's web browser. Alternatively, the user identifier and/or password may be assigned by the search system 114.

A message diagram showing an example communications exchange between a client device 102 and a plurality of information sources 108 is illustrated in FIG. 4. In this example, the communications exchange is initiated by a client device 102 displaying a document to a user (block 402). For example, the client device 102 may be displaying a web page, an e-mail message, a word processing document, a slide presentation, a map, and/or any other suitable document.

Each time the user context on the client device 102 changes, for example, if the content of the document displayed by the client device 102 changes, the client device 102 may automatically generate a context model message 404. For example, when the user stops typing into a word processing document (e.g., no activity for more than five seconds), the client device 102 may generate a context model message 404 representing the current state of the user's context, including the current word processing document, as discussed earlier. Alternatively, or in addition, the client device 102 may generate the context model message 404 in response to other events. For example, the client device 102 may generate the context model message 404 periodically and/or when the focus of the document changes. In other embodiments, the user may initiate this sequence themselves, e.g., by pressing a button.

The context model message 404 includes a context model. The context model is a representation of a user's current context based on the user's current document and/or other factors such as the application type associated with the document, the genre of the document (e.g., legal brief, patent application, resume, etc.), the user's task and the state of that task, explicit indication by the user (e.g., pressing a button or highlighting some words), the organization in which the user is currently working, and/or the user's role in that organization, etc. Preferably, the context model is generated by the context generation module 224 of the associated client device 102. The context model is a statistical and heuristic model of the user's context. For example, a user context including a text document containing occurrences of the words dog, cat, mouse, and book might be described in part by a context model like “dog:10; cat:6; mouse:3; book:1” where the numbers represent weights associated with the words. In this example, the context model indicates that the associated document is more about dogs than it is about cats. The weights may be assigned by any suitable algorithm. For example, the weighting algorithm may take into account the number of occurrences of each word, the location of each word (e.g., in the title of the document versus in the body), the style of the words (e.g., bold text versus plain text), etc. A detailed discussion of various methods of determining a context model is included in U.S. Patent Publication 2005/0027704, the entirety of which is incorporated herein by reference.
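
By way of illustration only, the weighted-term portion of such a context model might be computed as in the following minimal Python sketch. The function name, the token handling, and the specific bonus weights for title and bold words are hypothetical choices for illustration, not the particular method prescribed by this disclosure or by U.S. Patent Publication 2005/0027704:

    from collections import Counter

    def build_context_model(body, title="", bold_words=(), title_weight=3, bold_weight=2):
        """Build a weighted-term context model such as "dog:10; cat:6; mouse:3; book:1".
        Weights reflect occurrence counts plus extra credit for title and bold words
        (the extra-credit values are illustrative tunable parameters)."""
        model = Counter()
        for word in body.lower().split():
            model[word] += 1                      # one point per occurrence in the body
        for word in title.lower().split():
            model[word] += title_weight           # location bonus: word appears in the title
        for word in (w.lower() for w in bold_words):
            model[word] += bold_weight            # style bonus: word appears in bold text
        return dict(model)

    # A document that mentions dogs more often than cats yields a model that
    # indicates the document is more about dogs than about cats.
    print(build_context_model("dog cat dog mouse dog cat dog", title="dog book"))
    # {'dog': 7, 'cat': 2, 'mouse': 1, 'book': 3}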

The context model would in many cases also include representations of such factors as the user's current task and task state, the application in which the user is currently working, the document type or genre, the organization in which the user is working, the user's role in this organization, explicit user indications, etc.

In some cases, aspects of the user context may be directly available from the task a user is accessing. For example, aspects of the user context may come from explicit user indication, such as selecting a task from a menu, or using a certain document template (letter, resume, etc.) or other features of the user's application. Other aspects of a task may be directly available through application programming interfaces or by observing communication between the application and other software, such as the operating system, or hardware, such as a network device. In other cases, aspects of the user context may be based on, computed from, or derived from one or more of these directly available aspects. For example, text in the document a user is reading or writing may be directly available through an application programming interface, and that text could then be further processed to classify the user's document into one or more categories (e.g., legal brief, letter, outline, science paper), based on words that are present and/or absent in the document. This classification could then become one aspect of the context model. Other aspects of the user context may also be inferred from directly observable aspects of the user context other than the text of a document the user may be accessing. For instance, the stage in a task may be inferred from a step in a business process management system, or the status of an account as represented by a customer relationship management system.

It is preferable for the context model to contain all of the information useful for retrieving relevant documents from search engines, determining the relevance of those retrieved documents, and further determining how they are relevant. Moreover, the portion of the context model derived from the document itself can in turn be the result of an analysis process that is itself sensitive to all these sorts of features. For example, suppose the text of the user's document is analyzed in order to classify the document into one or more categories based on words that are present or absent in the document, and the current document is classified as a legal brief. As a result of classifying the document as a legal brief, the text may further be analyzed in order to extract the case citations present in the legal brief and to identify the jurisdiction under which the present case is being argued. These aspects (jurisdiction and case citations) may then be added to the context model separately from the original text. In addition, other aspects, such as the user's role, may be added to the context model. For example, if the document is a legal brief, and the user is writing the document, the role of legal brief drafter may cause the system to determine that legal opinions from the same judge are relevant. However, if the user is just reading the legal brief, the role of legal brief reviewer may cause the system to determine that other legal documents with similar content are relevant.

The context model may also include the text of the user's document, its classification as a legal brief, and words and phrases that describe the key themes in the dispute. The words and phrases that describe the key themes in the dispute may be based on the text of the original document. For example, words and phrases may be assigned weights, and those words with the highest weights may be included in the list of words and phrases that describe the key themes of the dispute. In determining the words that describe the main themes, those terms that occur more frequently may be assigned a higher weight. It is preferable that those terms that occur in more important sections of the document be assigned a higher weight. For example, words that occur in the summary of argument section of a legal brief could be assigned a higher weight than words that only occur in the table of citations section. More specifically, each time a word occurs in a document, one could be added to its weight, whereas if a word occurs in an important section, a number W would be added to its weight, where W is a tunable parameter greater than one. If a word appears in bold or all caps and in an important section, a number X could be added to its overall weight, where X is a tunable parameter greater than W.
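
A minimal sketch of this weighting scheme follows (Python; the section and style annotations and the default values of W and X are assumptions for illustration, with X > W > 1 as described above):

    def weight_terms(tokens, W=2.0, X=4.0):
        """tokens: iterable of (word, in_important_section, is_emphasized) triples.
        Adds 1 to a word's weight per ordinary occurrence, W per occurrence in an
        important section (e.g., summary of argument), and X per bold/all-caps
        occurrence within an important section, where X > W > 1 are tunable."""
        weights = {}
        for word, important, emphasized in tokens:
            if important and emphasized:
                increment = X          # bold or all caps inside an important section
            elif important:
                increment = W          # plain occurrence inside an important section
            else:
                increment = 1.0        # ordinary occurrence elsewhere in the document
            weights[word] = weights.get(word, 0.0) + increment
        return weights

    tokens = [("jurisdiction", True, True), ("jurisdiction", False, False),
              ("citations", False, False)]
    print(weight_terms(tokens))  # {'jurisdiction': 5.0, 'citations': 1.0}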

Furthermore, a list of words could be excluded from appearing in the present list of words that describe the key themes of the document. This list of words to exclude may be selected based on the type of document being viewed. For example, words like “a, an, the, but, or” may be excluded from all documents, whereas words like “jury, testified, defense, court, evidence, trial, alleged,” or names of those party to the case, may be excluded specifically from legal briefs. The terms excluded may be based in part on other aspects of the context model. The words or phrases excluded may also be added to the context model, for the purposes of later using them to assess, filter, rank, and organize search results. The words or phrases with the N highest weights could then be collected and assigned to the major themes of the document.
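
The exclusion and selection step might be sketched as follows (Python; the stop lists shown are abbreviated from the examples above, and the helper names are hypothetical):

    GENERAL_STOP_WORDS = {"a", "an", "the", "but", "or"}
    GENRE_STOP_WORDS = {
        "legal_brief": {"jury", "testified", "defense", "court",
                        "evidence", "trial", "alleged"},
    }

    def key_theme_terms(weights, genre, n=5):
        """Return the N highest-weighted terms, excluding the general stop words
        and any genre-specific stop words for the document type being viewed."""
        stops = GENERAL_STOP_WORDS | GENRE_STOP_WORDS.get(genre, set())
        candidates = [(term, w) for term, w in weights.items() if term not in stops]
        return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:n]

    weights = {"the": 20.0, "court": 9.0, "jurisdiction": 7.0, "negligence": 6.0}
    print(key_theme_terms(weights, "legal_brief", n=2))
    # [('jurisdiction', 7.0), ('negligence', 6.0)]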

Moreover, the context model may contain more than one list of words and phrases, each list representing separate aspects of the text of the overall document. For example, in a legal brief, one list of words and phrases could represent the statement of jurisdiction. Another list of words and phrases could represent the statement of the case, etc. In another example, if the user is browsing the web and viewing a web page on a news site on which more than one news story is presented, the context representation may include a list of words and phrases describing each story presented. Each list of words and phrases associated with each aspect may be computed using methods described herein. The beginning and end of each news article may be determined first by determining that the web page is being served by a news site and second by looking for features that occur between articles, such as article titles and hyperlinks to the full article. The context model may be represented as a list or collection of aspects. In general, one aspect of the context model may be based on one or more other aspects of the context model.

As part of generating the context model, the context generation module 224 makes a determination as to whether a search is likely to return useful results. For example, the user may be viewing the front page of an electronic newspaper covering multiple unrelated topics. By analyzing application and/or genre-specific document features, such as segmentation (e.g., columns in MS Word, frames in HTML, etc.), and/or other properties of the user's context, the query generation module 226 or 312 may determine that a search is unlikely to return useful results, or that certain sources of information may be more likely to contain useful information than other sources of information.

For example, if the user's document contains fewer than N words, where N is a tunable parameter, the query generation module 226 or 312 may determine that a search is not likely to return useful or interesting results. In another example, the average length of a paragraph of text is computed. If the average length of a paragraph of text is below a tunable parameter L, then the query generation module 226 or 312 may determine that a search is not likely to return useful or interesting results. The tunable parameter L may be related to other aspects of the user context, such as the type of document or application being accessed. For example, if the user is accessing a contact record in Microsoft Outlook, it is preferable that the paragraph length requirement not apply because contact records are typically very short. If the user is accessing a PowerPoint presentation, it is preferable that the paragraph length requirement be shortened, as PowerPoint presentations typically contain short paragraphs, often less than a sentence long. If the user is writing a document in Microsoft Word, however, it is preferable to work with full paragraphs of text.
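
These gating heuristics might be sketched as follows (Python; the application names, the default N and L values, and the per-application adjustments are purely illustrative assumptions):

    def search_likely_useful(text, application, min_words=25, min_paragraph_words=40):
        """Return False when a search is not likely to return useful results:
        the document has fewer than N words, or its average paragraph length is
        below the tunable parameter L (adjusted per application as noted above)."""
        words = text.split()
        if len(words) < min_words:              # fewer than N words: skip the search
            return False
        if application == "outlook_contact":    # contact records are very short, so
            return True                         # the paragraph-length test is waived
        if application == "powerpoint":         # slides have sub-sentence paragraphs,
            min_paragraph_words = 5             # so L is shortened
        paragraphs = [p for p in text.split("\n\n") if p.strip()]
        avg = sum(len(p.split()) for p in paragraphs) / max(len(paragraphs), 1)
        return avg >= min_paragraph_words       # average below L: skip the search

    print(search_likely_useful("only a few words here", "word"))  # False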

In cases where a user is browsing the web, the query generation module 226 or 312 scans the user's document and counts the number of words in hyperlinks and the number of words not in hyperlinks. If the ratio of hyperlinked to non-hyperlinked words exceeds a tunable threshold T, then the query generation module 226 or 312 may determine that a search is not likely to return useful or interesting results. In another example, the user may indicate an area of interest, and that area of interest may be represented by terms. If words occurring in areas of interest are not present in the user context, the query generation module 226 or 312 may determine that a search is not likely to yield interesting results. Similarly, areas of disinterest may be represented as lists of terms. If words on such a list appear in the user context, the query generation module 226 or 312 may determine that a search is not likely to yield interesting results.
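
A sketch of these web-page checks (Python; the threshold value and list handling are illustrative assumptions):

    def web_search_likely_useful(hyperlinked_words, plain_words, context_terms,
                                 interest_terms=(), disinterest_terms=(),
                                 max_link_ratio=0.5):
        """Apply the hyperlink-ratio and interest/disinterest checks described
        above. Returns False when the page is dominated by link text, when
        declared areas of interest are absent from the context, or when
        area-of-disinterest terms appear in the context."""
        if plain_words == 0 or hyperlinked_words / plain_words > max_link_ratio:
            return False                         # link-heavy page (e.g., a front page)
        terms = set(context_terms)
        if interest_terms and not terms & set(interest_terms):
            return False                         # no area-of-interest terms present
        if terms & set(disinterest_terms):
            return False                         # area-of-disinterest terms present
        return True

    print(web_search_likely_useful(10, 200, {"ecology", "emissions"},
                                   interest_terms={"ecology"}))  # True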

In addition, the query generation module 226 or 312 may analyze the text of the current user document to measure a degree of term overlap. For example, the user's document may be broken up into sections of length W words, for example by starting at word one and storing until word W, and then starting at word W-O and storing until word W+W-O, where W is a tunable parameter and O is a tunable overlap parameter. If a certain threshold degree of commonality exists between document segments (e.g., all of the document segments are relevant to the Olympics), the query generation module 226 or 312 may determine that a search is likely to return useful results. For example, if a term occurs in both segment one and segment two, then one may be added to the overlap score of segment one and segment two. If the overlap score of two segments is greater than a threshold that is some function of the length of the text window W, then the two text segments may be called coherent. If a certain portion of subsequent text segments are coherent, then the document as a whole may be called coherent and therefore a search may be allowed to proceed. Otherwise, the query generation module 226 or 312 may determine that a search is not likely to return useful results. In addition, if a certain threshold for density of links on a page is exceeded, the query generation module 312 may determine that the page does not have sufficiently rich content to search. The query generation module 312 may also determine that the results of a search may be irrelevant or unnecessary to a user based on the broader, non-document-specific components of the context model such as current task, user role, etc.
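
The windowing and coherence test might be sketched as follows (Python; the coherence threshold, here a fixed minimum shared-term count, and the required fraction of coherent segment pairs are illustrative stand-ins for the tunable functions described above):

    def document_is_coherent(words, W=100, O=20, min_overlap=3, min_fraction=0.5):
        """Split the document into windows of W words overlapping by O words; add
        one to a pair's overlap score per shared term; call adjacent windows
        coherent when their overlap score clears a threshold, and the document
        coherent when enough adjacent window pairs are coherent."""
        step = max(W - O, 1)
        windows = [set(words[i:i + W]) for i in range(0, max(len(words), 1), step)]
        if len(windows) < 2:
            return True                          # too short to segment; allow search
        coherent_pairs = sum(1 for a, b in zip(windows, windows[1:])
                             if len(a & b) >= min_overlap)
        return coherent_pairs / (len(windows) - 1) >= min_fraction

    print(document_is_coherent(["olympics", "medal", "games"] * 80))  # True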

The query generation module 312 may also use non-document-specific context information to override the search determination for a document, either initiating a search for a document that does not otherwise meet certain searching criteria (e.g., link density, term overlap) or suppressing a search that would otherwise proceed. For example, if the user is browsing the web and the document they are viewing is being transmitted over a secure channel, the URL or location of the document may start with the string “https:”. It may not be desirable to search automatically based on such pages, because the data within them is often sensitive.

Through such methods, the system prevents retrieval and presentation to the user of information based on user contexts and documents for which the returned items are likely to be ranked low or otherwise prove irrelevant. Conversely, such methods ensure the most relevant information is sought from the information sources 108 most likely to produce it. Preferably, the user would be given an option to override this determination. For example, if the user selects one document segment over others, the query generation module 226 or 312 could focus its analysis on that segment.

If the query generation module 226 or 312 determines that a search is likely to return useful results, the query generation module 226 or 312 forms multiple search queries targeted to multiple information sources 108 (block 406). For example, one information source 108 may allow Boolean operators, and another information source 108 may not allow Boolean operators. Similarly, one information source 108 may allow up to four search terms, and another information source 108 may only allow two search terms. An information source 108 that allows four search terms preferably receives a query including the top four terms in the context model (e.g., dog, cat, mouse, and book), and an information source 108 that only allows two search terms preferably receives a query including the top two terms in the context model (e.g., dog and cat). Any suitable method of selecting information sources 108 and generating queries may be used. A detailed discussion of various methods of generating queries is included in U.S. Patent Publication 2005/0028156, the entirety of which is incorporated herein by reference.
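
For illustration, forming per-source queries from the top-weighted context terms might look like this minimal Python sketch (the source names and term limits are hypothetical):

    def generate_queries(context_model, source_term_limits):
        """Form one query per information source, truncating the context model's
        terms (sorted from highest to lowest weight) to each source's term limit."""
        ranked_terms = sorted(context_model, key=context_model.get, reverse=True)
        return {source: ranked_terms[:limit]
                for source, limit in source_term_limits.items()}

    model = {"dog": 10, "cat": 6, "mouse": 3, "book": 1}
    print(generate_queries(model, {"four_term_source": 4, "two_term_source": 2}))
    # {'four_term_source': ['dog', 'cat', 'mouse', 'book'],
    #  'two_term_source': ['dog', 'cat']}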

One or more query messages 408 are then sent to one or more information sources 108. In response to receiving a query message 408, each information source 108 searches one or more information repositories and generates a set of search results (block 410). For example, a search engine such as Google™ may generate a plurality of summaries, wherein each summary includes a portion of an associated document (e.g., a web page). Typically, these summaries are intended to provide a human user with some notion of the associated document's contents so that the user may assess the relevance of the document to the user's needs. Each information source's search results are then transmitted to the search system 114 and/or the client device 102 in a separate search results message 412.

Other information sources 108 may provide other data about the search results, such as the subject area, industry classification, date of publication, or author. This data may also be used as a feature of the result model, included in the search results message 412. In addition, the result model may include the query used to generate those results as a feature, included in the search results message 412. For example, if the query created by the query generation module 226 or 312 is directed at a news database, the result modeling module may treat news items with preference, depending on the original context model. Many information sources allow users to enter additional constraints that significantly change the character, subject area, or other properties of the search results retrieved. For example, users may be able to specify the type of item retrieved, e.g., news, patents, journal articles, or WWW home pages. Other properties, such as the date the document was published or the WWW location in which the document was published, are also often available as constraints on the search query to information sources. These constraints allow the query generation module 226 to specify at a suitable level of specificity what information should be retrieved. The query generation module 226 may only generate queries directed at certain information sources or with certain constraints in response to certain properties of the user context. Any property of the information repository being searched, the search results as a group, or individual results on their own may be used for the purposes of assessing, ranking, organizing, and presenting search results.

The result modeling module 228 or 314 uses the search result messages 412 to create search result models and compares the search result models to the original context model (block 414). The search result models may be compared to the original context model using any suitable scoring and/or comparison algorithm. For example, the client device 102 or the search system 114 may generate a score for each search result model by multiplying the weights of terms that are common to both the search result model and the original context model and then summing those products.

A search result model is a representation of a search result from an information source 108. Each search result model is a statistical and heuristic model of the search result that may include lexical (words or phrases) or symbolic (logical) data. For example, a summary from a news article including occurrences of the words dog and cat might be described by a search result model as “dog:4; cat:3; IsNewsArticle” where the numbers represent weights associated with the words, and IsNewsArticle indicates the type of document. In this example, the search result model indicates that the associated document is more about dogs than it is about cats. The weights may be assigned by any suitable algorithm. For example, the weighting algorithm may take into account the number of occurrences of each word, the location of each word (e.g., in the title of the summary versus in the body), the style of the words (e.g., bold text versus plain text), etc. In addition, the result modeling module 228 or 314 may use an information source-specific stop list when constructing the search result model in order to prevent the inclusion of certain terms. For example, “court” may be on the stop list for Lexis® but not on the stop list for Google™.

A search result model may include a summary of a search result returned by the information source 108 in response to a query, and/or the search result model may be derived from that summary through statistical and heuristic methods. The summaries returned by information sources 108, whether written by humans or automatically generated, are generally intended to enable human users to assess the relevance of the search results. Thus, these summaries are not necessarily optimal as, or for constructing, search result models for the purpose discussed here (i.e., for comparison with a context model to assess relevance of the search result to the user's current context). In certain embodiments, the information source 108 may return a fuller or more representative summary of the search result derived statistically and/or heuristically, specifically for the purpose of enabling ranking, organizing, and/or presenting information, as described here. More generally, the information source 108 may return meta-data about the search result and/or properties of the information source 108 itself. This meta-data may or may not be specifically designed for the purpose of enabling ranking, organizing, and/or presenting information.

In certain embodiments, an information source 108 may return the entire document associated with each search result, rich meta-data associated with each search result, or a model of each such document (as opposed to a summary of each document) that may include lexical and symbolic representations. For example, the search result model for a result returned by the information source may contain a list of words occurring in the document along with the frequency with which each word occurs in that document. The information source 108 may also return data concerning the information source 108 as a whole, for example, statistical information about the entire set of documents, such as the number of documents in which a term occurs, or other data elements the document contains. The search result model returned by the information source for a result may be based in part on this statistical information. For example, the weights associated with the terms in a list comprising an aspect of such a model may be modulated by this information. The search result model may also be based in part on a stop list to exclude certain terms from inclusion.

This search result model may also take into account the location of a term or terms in the document. For example, a term which is located in a heading in the document may be weighted more highly in the list of terms comprising an aspect of the search result model returned by the information source. The model may also take into account stylistic aspects of the document. For example, a word which is in bold face, or in a larger font size than the rest of the document, may be weighted more highly in the list of terms. Conversely, a term which is in a smaller type font may have its weight reduced. The search result model may also take into account the order of terms in the document. For example, if two terms occur together in a given order, this order may be reflected in the search result model as well.

The search result model may also be based on the genre or type of the document. Examples of this include an archived email, a resume, a patent application, a legal brief, etc. The genre or type information may be used, for example, to determine a specialized stop list of terms to be excluded from the model. In addition, the genre or type information may be used to identify key terms of particular interest or to alter the weighting of terms in the model. For example, the terms following the label “Subject” in an archived email might be weighted more highly than other terms. Similarly, the result model may be based on the application used to create the document.

Furthermore, the search result model may contain aspects of the user context in which the document was originally produced, such as the task that resulted in that document. In one example, the present system may submit the context model to a search engine along with the user's document when the document is being saved. The search engine could then return the stored context model along with the search results. Alternatively, the stored context model may be incorporated into the search result model returned for that document, or the search result model may be based on this stored context model. For example, other aspects of the search result model, such as the weights of terms, may be changed on the basis of this context model.

In this manner, improved assessment, ranking, organization, and presentation may be performed based on a more detailed and accurate search result model. Alternatively, summary style search results typically include a pointer to the full document associated with the summary, which may be used to retrieve the full document. For example, most Internet search engines return a hyperlink to the associated web page. In any case, the client device 102 or the search system 114 may use a search result model created from some or all of the full document (as opposed to just the search result summary), in addition to other data about each search result. For example, the hyperlink itself may contain additional data that is helpful for ranking, organizing, or presenting search results for a given context model. In one example, search results are organized by the Internet domain under which each search result occurs.

The assessment, organization, and ranking module 230 or 316 uses the comparison of the search result models to the original context model to assess, organize, and rank the search results (block 416). In one example, the assessment, organization, and ranking module 230 or 316 compares all of the terms occurring in a search result model to terms occurring in the user's document. Consider a user working on a document about ecology and global warming. The context model might include terms like “ecology:5; ‘global warming’:10; emissions:9; co2:5; ‘greenhouse gas’:4.” Further consider a search engine that only accepts one term. Given the context model above, the search term “global warming” might be selected as a query. Executing a search based on that term may produce several search results, with search result models as follows. Search Result 1: “‘global warming’:2; developing:1; country:1; china:1.” Search Result 2: “‘global warming’:2; ‘greenhouse gas’:2.” According to the search engine, Search Result 1 is more relevant than Search Result 2. But the search engine does not have all of the information included in the context model. Therefore, the assessment, organization, and ranking module 230 or 316 may further compare the context model with these search result models to arrive at a score, for example, by multiplying the weights of terms occurring in the search result model with the weights of the same terms occurring in the context model, summing those products, and dividing by the number of unique terms in the search result model. If a term is not present in the context model, it may be given a weight of zero. In the example above, then, Search Result 1 would be given a score of 5, whereas Search Result 2 would be given a score of 14. The assessment, organization, and ranking module 230 or 316 may thereby determine that Search Result 2 is more relevant to the user context than Search Result 1 and thereby rank Search Result 2 ahead of Search Result 1, even though the search engine originally ranked them in the opposite order. Similarly, when querying multiple search engines, the present method may be applied to search results from all search engines queried in order to rank search results into a single ordered list.
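
The scoring in this example can be reproduced with a short sketch (Python; the function name is hypothetical, and the formula is the multiply-sum-divide comparison described above):

    def relevance_score(context_model, result_model):
        """Multiply the weight of each result-model term by its weight in the
        context model (zero if absent), sum the products, and divide by the
        number of unique terms in the result model."""
        total = sum(context_model.get(term, 0) * weight
                    for term, weight in result_model.items())
        return total / len(result_model)

    context = {"ecology": 5, "global warming": 10, "emissions": 9,
               "co2": 5, "greenhouse gas": 4}
    result_1 = {"global warming": 2, "developing": 1, "country": 1, "china": 1}
    result_2 = {"global warming": 2, "greenhouse gas": 2}
    print(relevance_score(context, result_1))  # 5.0
    print(relevance_score(context, result_2))  # 14.0 -> ranked ahead of Result 1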

In addition to ranking and organizing, in some embodiments, the assessment, organization, and ranking module 230 or 316 may simply eliminate certain search results rather than presenting them to the user. Search engines sometimes return irrelevant results. This may be because the search engine lacks information about the user's context. The present example may eliminate search results with a score of zero, allowing the system to only present search results that have at least one word in common with the user's context. Furthermore, search results that rank below a certain threshold, either absolute or relative to other results, may be eliminated.

For example, in one embodiment, the system is connected to one or more search engines such as one or more World-Wide Web (WWW) search engines. In a WWW search, there are typically no editorial controls on what information is contributed to the databases. As a result, these search engines may contain “junk” data. In some cases, data may be specifically generated by a malicious publisher to “game” the search engine so as to gain more referral traffic from the search engine, while providing no valuable information to the user. In the industry, this is called “search SPAM.” In order to avoid presenting this irrelevant information to the user, the present system compares the search query with the search result model in order to determine the longest uninterrupted sequence of search terms occurring in the search result model returned by the search engine that occur in the same order as in the search query that originally generated that search result. In other words, the present system computes the longest matching subsequence of the search query that appears in the search result model. Search results that contain a sequence of search terms of length greater than or equal to a tunable parameter T in their descriptions are considered search SPAM and preferably eliminated from the search results presented to the user. SPAM removal can be turned on and off on a per-information source basis so as to avoid false positives.
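
A sketch of the longest-run test (Python; tokenization by simple whitespace splitting and the default value of T are illustrative assumptions):

    def longest_query_run(query_terms, description_words):
        """Length of the longest uninterrupted run of query terms that appears in
        the result description in the same order as in the original query."""
        longest = 0
        for d_start in range(len(description_words)):
            for q_start in range(len(query_terms)):
                run = 0
                while (q_start + run < len(query_terms)
                       and d_start + run < len(description_words)
                       and description_words[d_start + run] == query_terms[q_start + run]):
                    run += 1
                longest = max(longest, run)
        return longest

    def is_search_spam(query_terms, description, T=3):
        """Flag a result as search SPAM if its description echoes T or more
        consecutive query terms verbatim (T is the tunable parameter above)."""
        return longest_query_run(query_terms, description.lower().split()) >= T

    print(is_search_spam(["cheap", "dog", "toys"],
                         "Buy cheap dog toys online now"))  # True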

Furthermore, some of the search engines to which the system is connected may provide search results that contain none of the words mentioned in the search query, so as to provide a list of search results to the user even when there are no exact matches. Typically, these documents are irrelevant. Therefore, the present system is preferably configured to eliminate search results for which there are no terms in common between the search result model and the context model.

The system may be connected to multiple WWW search engines, in addition to other databases that contain content that is less broadly applicable (e.g., Lexis-Nexis). Given that the system is connected to so many different sources, many duplicate search results may be retrieved. Near-duplicates may be eliminated using methods described in U.S. Patent Publication 2005/0028156. However, the resulting list may still contain similarities and, especially in light of the methods of ranking search results described herein, provide the user with too many documents that are related to the user's context in the same way. Therefore, if a search result is related to the user's context in the same or a similar way as another search result, one of the two search results is preferably eliminated before the search results are presented to the user. This provides a more interesting list of search results. More specifically, given a context model C, consisting of terms C1, C2, . . . , CN, a search result model R, consisting of terms R1, R2, . . . , RN, and another search result model R′, consisting of terms R′1, R′2, . . . , R′N, let I(C,R) be the intersection of C and R, and I(C,R′) be the intersection of C and R′. If the size of the intersection of I(C,R) and I(C,R′) is greater than a tunable parameter T, then R′ is eliminated before the search results are presented to the user. In addition, term stems, multiple-word phrases, or any aspect of the context model or search result model may be substituted for terms; any method of computing how a search result relates to a context may be substituted for the function I; and the size of intersection may be replaced with a weighted comparison metric that may not be transitive (e.g., dot product, cosine, etc.) or any other suitable method for comparing relatedness.
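
A sketch of this elimination rule (Python; results are represented as term sets, and the default T is an illustrative assumption):

    def eliminate_similarly_related(context_terms, result_term_sets, T=3):
        """Keep a result only if it relates to the context differently from the
        results already kept: R' is dropped when |I(C,R) & I(C,R')| > T, where
        I(C,R) is the set of terms shared by the context model and result model."""
        C = set(context_terms)
        kept, kept_intersections = [], []
        for terms in result_term_sets:
            intersection = C & terms             # I(C, R') for the candidate result
            if any(len(intersection & prior) > T for prior in kept_intersections):
                continue                          # related to the context the same way
            kept.append(terms)
            kept_intersections.append(intersection)
        return kept

    context = {"ecology", "emissions", "co2", "warming", "policy"}
    results = [{"ecology", "emissions", "co2", "warming", "china"},
               {"ecology", "emissions", "co2", "warming", "india"},  # same relation
               {"policy", "senate"}]
    print(len(eliminate_similarly_related(context, results, T=3)))  # 2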

The system might also use the results of comparing the search result models with the context model to organize the search results in some appropriate way, for example, by segmenting them based on categories that are selected based on the user's current task, or properties of the search results themselves. The ranked and organized search results are a combination of the search results from multiple information sources 108 in an order that is not necessarily the same as the order of the individual search results received from the information sources 108. For example, one information source 108 may return a summary of documents A, B, and C ranked in that order, and another information source 108 may return a summary of documents C, D, and F ranked in that order. However, the assessment, organization, and ranking module 230 or 316 may rank the combined results as B, C, D, A, F.

The assessment, organization, and ranking module 230 or 316 uses the comparison of the search result models to the original context model to organize and present the search results (block 416). In other words, relevancy is a matter of degree (position in a list) and/or type (which group to be included in). An example of search results organized into different categories is illustrated in FIG. 6 (e.g., Top Results, Web, News, Blogs, Shopping, Desktop, etc.).

The original context model is used by the assessment, organization, and ranking module 230 or 316 to determine which organization scheme to use and which presentation strategy to pursue. When the user's context changes, the organization scheme and presentation strategy may also change to best support and reflect the user's current task, properties of the user's document, document genre, application, etc.

In one embodiment, search results are organized by the assessment, organization, and ranking module 230 or 316 based on combinations of rules activated by the original context model that combine a plurality of features of the search result model to produce a categorized list. Similar rules may be used to select a presentation strategy, e.g., a pop-up display, banner, tickertape, embedded links in the user's active document, etc.

For example, when a user is composing an email message, information associated with the email recipient may be placed next to the email recipient, and information associated with the body of the email may be placed next to the body, whereas additional information on the topics discussed in each news article on a web site may be presented when the user moves her mouse over the text in the article. When the user is accessing an email application and composing an email in that application, the above presentation scheme may be selected by comparing aspects of the user context with a list of rules. For example, while writing an email, the user context could include representations of the application name, the application type, the active task, the stage in the task, the sender and recipient names, the location of the recipient in screen coordinates, and the body of the email. More specifically, the context model might include: “ApplicationName=‘Microsoft Outlook’; ApplicationType=‘Email’; Task=‘ComposeEmail’; Stage=‘beginning’; Sender=‘John Doe’; Recipient=‘Jane Doe’; RecipientLocation=‘10,10’; Body=‘Hi Jane,’”. The system may further include rules in the form of antecedent-consequent pairs, where antecedents include features of the context model and consequents include features of result models (so a subset of the results may be selected) and instructions on how to display the results. For example, consider a set of search results gathered from multiple search engines (WWW search engines, desktop search engines that contain email and files, and other databases) based on an email a user is composing. In order to display information about an email recipient next to the recipient's name, a rule may be expressed as follows to select search results that are email messages sent by the recipient and display those search results next to the location of the recipient on the screen: “IF Task=‘ComposeEmail’ And ApplicationName=‘Microsoft Outlook’ THEN SELECT DocumentType=‘Email’ EmailSender=%Recipient% DISPLAY AT %RecipientLocation%”. By binding variables in the rule to values in the user context model, the rule could then be rewritten as follows: “IF Task=‘ComposeEmail’ And ApplicationName=‘Microsoft Outlook’ THEN SELECT DocumentType=‘Email’ EmailSender=‘Jane Doe’ DISPLAY AT ‘10,10’”. A number of other organization schemes may be activated based on different user contexts by listing similar rules.
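
The following Python sketch illustrates, under assumed data structures, how such antecedent-consequent rules and %variable% bindings might be evaluated against the email context model above; the dictionary-based rule format is an illustrative assumption, not a prescribed implementation.

    # Hypothetical sketch of rule-based selection and placement. A rule's
    # antecedent is matched against the context model; %variables% in the
    # consequent are bound from the context before results are selected.
    def rule_applies(antecedent, context):
        return all(context.get(k) == v for k, v in antecedent.items())

    def bind(template, context):
        bound = {}
        for key, value in template.items():
            if isinstance(value, str) and value.startswith("%") and value.endswith("%"):
                bound[key] = context[value.strip("%")]  # bind %Var% from context
            else:
                bound[key] = value
        return bound

    rule = {
        "antecedent": {"Task": "ComposeEmail", "ApplicationName": "Microsoft Outlook"},
        "select": {"DocumentType": "Email", "EmailSender": "%Recipient%"},
        "display_at": "%RecipientLocation%",
    }
    context = {
        "ApplicationName": "Microsoft Outlook", "ApplicationType": "Email",
        "Task": "ComposeEmail", "Stage": "beginning",
        "Sender": "John Doe", "Recipient": "Jane Doe",
        "RecipientLocation": "10,10", "Body": "Hi Jane,",
    }
    if rule_applies(rule["antecedent"], context):
        selector = bind(rule["select"], context)
        location = context[rule["display_at"].strip("%")]
        print(selector, "->", location)
        # {'DocumentType': 'Email', 'EmailSender': 'Jane Doe'} -> 10,10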

Search results may be categorized based on any attribute of the search result model, including the query that generated the search results. For example, the original context model may have specified that the user is a lawyer, that the user is viewing a contact in Microsoft Outlook, and that the search results should be grouped by type, including, for example, news stories about the contact person's company, the home page of the contact person, email recently exchanged with that contact person, and any recent litigation filed by the contact person's company, among others. The query generation module 226 or 312 may then respond by dispatching several queries, for example: (1) a query to Lexis®, specifying that only news articles should be retrieved and specifying the contact person's company name; (2) another query to Yahoo! News® specifying the contact person's company name; (3) another to MSN® with the name and company name of the contact person; (4) another with the contact person's name to desktop search software, specifying recent email; and (5) yet another to Lexis®, specifying that only litigation in which the contact person's company is named should be retrieved. The assessment, organization, and ranking module 230 or 316 could then group search results from queries (1) and (2) within a category labeled Company News, items from query (3) under Home Pages, search results from query (4) within a category labeled Recent Email, search results from query (5) under a category labeled Litigation, and so on.
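
A minimal Python sketch of this dispatch-and-group step appears below; the fetch function is a placeholder for real connectors to the named sources, and the queries shown are hypothetical.

    # Hypothetical sketch: dispatch several source-specific queries and
    # group the returned results under category labels keyed by query.
    def fetch(source, query):
        # Placeholder for a real connector to Lexis, Yahoo! News, MSN, etc.
        return [f"{source} result for {query!r}"]

    dispatch_plan = [
        ("Company News", "Lexis", "Acme Corp news"),
        ("Company News", "Yahoo! News", "Acme Corp"),
        ("Home Pages", "MSN", "Jane Doe Acme Corp"),
        ("Recent Email", "Desktop Search", "Jane Doe recent email"),
        ("Litigation", "Lexis", "Acme Corp litigation"),
    ]

    grouped = {}
    for category, source, query in dispatch_plan:
        grouped.setdefault(category, []).extend(fetch(source, query))

    for category, items in grouped.items():
        print(category, items)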

Properties of each individual search result may be used in a similar way by the assessment, organization, and ranking module 230 or 316 to organize the search results. For example, the date the document corresponding to a given search result was published may be used to organize search results into categories such as today, last week, last month, last year, etc. by comparing the current date with the date associated with each search result. Similarly, the file format of the document, its subject area, words present or absent in the document summary or abstract, the content source, etc., may be used by the assessment, organization, and ranking module 230 or 316 to organize the search results.
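
For example, a date-based grouping of this kind might be sketched in Python as follows; the particular bucket boundaries are illustrative assumptions.

    # Hypothetical sketch: bucket results into recency categories by
    # comparing each result's publication date with the current date.
    from datetime import date, timedelta

    def recency_category(published, today=None):
        today = today or date.today()
        age = today - published
        if age <= timedelta(days=1):
            return "Today"
        if age <= timedelta(days=7):
            return "Last Week"
        if age <= timedelta(days=31):
            return "Last Month"
        if age <= timedelta(days=365):
            return "Last Year"
        return "Older"

    print(recency_category(date.today() - timedelta(days=3)))  # -> Last Week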

To provide more organized and/or more complete information to the user, the assessment, organization, and ranking module 230 or 316 may determine that, given a user context, additional information is required in order to evaluate the quality and/or character of the search results. A single information source may not provide complete information about a retrieved item. For example, an internet search engine may provide a URL, Title, and Summary of a web site, but a social bookmarking site like del.icio.us may provide a user ranking for a given web site, along with comments about that web site, which could be useful in assessing, ranking, and/or organizing search results provided by the internet search engine. The assessment, organization, and ranking module 230 or 316 may issue a number of additional queries in order to gather additional information based on an initial retrieval. For example, by retrieving user ratings, reviews, and tags or categories for a web site retrieved in a first step, an original search result model may be enhanced. The assessment, organization, and ranking module 230 or 316 could then use this enhanced search result model to assess, rank, and/or organize search results by comparing the enhanced search result model with the original context model. For example, search results may be ordered by a combination of keyword overlap and user rating, and/or search results may be organized into categories labeled by tags (e.g., del.icio.us tags) users have given them. In addition, chains of arbitrary length associated with an arbitrary number of information sources may be constructed in order to further enhance the search result model for the purpose of assessing, ranking, and/or organizing search results with respect to a given context model.
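
The following Python sketch illustrates such a two-step enhancement under stated assumptions: the lookup function stands in for a del.icio.us-style service, and the scoring formula (keyword overlap plus a weighted user rating) is one example of the combinations mentioned above.

    # Hypothetical sketch: enhance a result model with ratings and tags
    # from a second source, then score by keyword overlap plus rating.
    def lookup_social_data(url):
        # Placeholder: a real system would query a social bookmarking service.
        return {"rating": 4.2, "tags": ["mouse", "hardware"]}

    def enhance(result):
        extra = lookup_social_data(result["url"])
        for tag in extra["tags"]:
            result["model"][tag] = result["model"].get(tag, 0) + 1
        result["rating"] = extra["rating"]
        return result

    def score(result, context_model, rating_weight=0.5):
        overlap = len(result["model"].keys() & context_model.keys())
        return overlap + rating_weight * result.get("rating", 0)

    context = {"dog": 10, "cat": 6, "mouse": 3, "book": 1}
    result = {"url": "http://example.com", "model": {"mouse": 2}}
    print(score(enhance(result), context))  # 1 overlapping term + 0.5 * 4.2 = 3.1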

In some contexts, information retrieved from one source can be combined with elements of the context model to provide input into another source. For example, while the user is shopping online, a product name may be extracted from the page in which the user is shopping. That product name may then be added to the context model. More specifically, say the user is viewing a page on a shopping site for a wireless mouse made by Microsoft. Given that the user is visiting a shopping site, the system could infer that the user is shopping. Given the structure of the shopping site, the system could extract the product name and manufacturer from the web page, in addition to key words and phrases. Thus the context model may include “UserIsShopping; ProductName=‘Wireless Mouse’; ProductVendor=‘Microsoft’; Price=‘$29.99’” in addition to other important words and phrases like “PC, Windows, silver, optical”.

Since the user is shopping, it may be desirable to retrieve information about similar products from other vendors. The product name may readily be used to look up vendors of similar products in a product database. The key words in the context model can be used to filter, sort, and/or organize the results of that search. Furthermore, in the context of shopping, and given this list of similar products from other vendors, it may be desirable to look up the prices of those products or to find images of those products to present to the user. Thus, the search system may direct a plurality of queries to a plurality of additional information sources in order to retrieve price information and an image of each product.

This information may then be further combined with the results of previous queries in order to form a more detailed search result model. Given the context model generated in previous steps, the search result model may then be further evaluated and organized so that the best information is presented to the user in a way that makes the most sense in the given context. For example, the system may organize the results of the above set of assessment and retrieval steps into a category of items presented in a user interface labeled “Similar Products” that includes a list of other wireless mice in the price range of $10-50 whose descriptions include at least two of the words and phrases originally present in the context model, listed in order of most overlapping to least overlapping description. Other categories of information, such as “Professional Reviews” or “User Comments,” may be generated through a similar process of combining elements of the context model with elements of a first retrieval step in order to formulate a second or third retrieval step, which is further evaluated, assessed, and/or organized in light of the original context model.
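
A Python sketch of the “Similar Products” assembly described above follows; the candidate data mirror the example in the text, while the price band, minimum-overlap value, and whitespace tokenization are illustrative assumptions.

    # Hypothetical sketch: keep products in a price band whose descriptions
    # share at least two context terms, ordered by descending overlap.
    def overlap(description_words, context_terms):
        return len(set(description_words) & context_terms)

    def similar_products(candidates, context_terms, low=10.0, high=50.0, min_overlap=2):
        in_band = [c for c in candidates if low <= c["price"] <= high]
        matching = [c for c in in_band
                    if overlap(c["description"].split(), context_terms) >= min_overlap]
        return sorted(matching,
                      key=lambda c: overlap(c["description"].split(), context_terms),
                      reverse=True)

    context_terms = {"PC", "Windows", "silver", "optical", "wireless", "mouse"}
    candidates = [
        {"name": "Mouse A", "price": 24.99, "description": "silver optical wireless mouse"},
        {"name": "Mouse B", "price": 79.99, "description": "wireless mouse"},
        {"name": "Mouse C", "price": 19.99, "description": "black wired mouse"},
    ]
    print([p["name"] for p in similar_products(candidates, context_terms)])
    # -> ['Mouse A']  (B is outside the price band; C overlaps on only one term)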

It will be further appreciated that the original context model may itself be augmented as the result of a retrieval and assessment step. For example, if the only discernible property on a shopping site is a UPC code, the product name and manufacturer may be accessible from a product database. This information may then be incorporated into the context model, and the above retrieval process may then be initiated. Thus, the search result model may be combined with the original context model to form a modified context model, which can then be subject to further retrieval, assessment, ranking, and/or organization. This process of chaining sources allows the system to provide better search results to the user. The search results are organized, ranked, and/or assessed using more of the available information, even if that information must be retrieved from multiple information sources.

The assessment and ranking method outlined above may or may not be applied within a search result category, at the user's specification, based on properties of the user context, properties of the search result model, etc. A similar method may be used to determine the presentation strategy for the search results. By combining sets of rules that operate on the properties of the search result model, the context model can flexibly specify how search results should be organized and presented by the assessment, organization, and ranking module 230 or 316.

The assessment, organization, and ranking module 230 or 316 then generates a ranked and organized search results message 418 specifying the preferred presentation strategy. The ranked and organized search results may then be viewed by the user of the client device 102 at the same time the user views the document associated with the search results (block 420). For example, the ranked search results may be viewed in a side bar to the document being displayed by the client device 102 (see FIG. 6). In other embodiments, the assessment and ranking module may produce other instructions about how the results should be displayed, for example, by organizing the search results into categories, or by specifying the most appropriate user interface modality given the current context model, for example, by embedding links into the user's active document.

A flowchart of an example process 500 for obtaining, assessing, and ranking search results is illustrated in FIG. 5. Preferably, the process 500 is embodied in one or more software programs which are stored in one or more memories and executed by one or more processors. For example, the process 500 may be software running on a client device 102 and/or the context based search system 114. Although the process 500 is described with reference to the flowchart illustrated in FIG. 5, it will be appreciated that many other methods of performing the acts associated with the process 500 may be used. For example, the order of many of the steps may be changed, and some of the steps described may be optional.

Generally, the process 500 analyzes a user's current context, including, in particular, a document being viewed by the user at a client device 102, to automatically form multiple search queries associated with that document. The queries are sent to multiple information sources 108, which respond with different search results. Models of the search results are then compared to a context model as described above to create a ranked and organized list of the search results for display to the user.

More specifically, the process 500 is typically triggered each time a document being viewed at a client device 102 changes (block 502). For example, the user may click a hyperlink in a web page, thereby changing the content of a browser window, or the user may simply change focus (e.g., where the cursor is placed) within the same document.

When the document changes, the client device 102 or the search system 114 analyzes the document as well as other aspects of the user's context as described above to create a context model (block 504). As described above, the context model is a statistical and heuristic model of the user's context. For example, if the user is viewing a text document that includes occurrences of the words dog, cat, mouse, and book, the context model might be “dog:10; cat:6; mouse:3; book:1”, where the numbers represent weights associated with the words. Again, the weighting algorithm may take into account the number of occurrences of each word, the location of each word (e.g., in the title of the document versus in the body), the style of the words (e.g., bold text versus plain text), properties of the user's task, the active application, the user's role in an organization, etc., as described earlier. It will be appreciated that any suitable method of generating context models may be used.
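
A minimal sketch of one such weighting scheme appears below; the specific weights for title, bold, and body occurrences are illustrative assumptions chosen so that the toy input reproduces the example model above.

    # Hypothetical sketch of building a weighted context model from a
    # document. The weights (title 3x, bold 2x, body 1x) are illustrative
    # assumptions, not values fixed by the disclosure.
    from collections import Counter

    def build_context_model(title_words, body_words, bold_words=()):
        model = Counter()
        for w in title_words:
            model[w.lower()] += 3
        for w in bold_words:
            model[w.lower()] += 2
        for w in body_words:
            model[w.lower()] += 1
        return dict(model)

    model = build_context_model(
        title_words=["Dog"],
        body_words=["dog"] * 7 + ["cat"] * 6 + ["mouse"] * 3 + ["book"],
    )
    print(model)  # -> {'dog': 10, 'cat': 6, 'mouse': 3, 'book': 1}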

Based on the context model, the client device 102 or the search system 114 forms multiple queries targeted to multiple information sources 108 (block 506). As described above, different information sources may have different limitations placed on how queries may be formed. Accordingly, the search system 114 customizes each query for each information source 108. For example, an information source 108 that only allows two search terms may receive the query “dog OR cat”. The client device 102 or the search system 114 then sends the queries to the respective information sources (block 508). For example, the client device 102 or the search system 114 may send one query to Google™ over the Internet and another query to a proprietary database over a local intranet.
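
For instance, a source-specific query of this kind might be formed by selecting the highest-weighted context terms up to the source's term limit, as in the following hypothetical Python sketch.

    # Hypothetical sketch: form a source-specific query by taking the
    # highest-weighted context terms up to the source's term limit.
    def form_query(context_model, max_terms, joiner=" OR "):
        top = sorted(context_model, key=context_model.get, reverse=True)[:max_terms]
        return joiner.join(top)

    context = {"dog": 10, "cat": 6, "mouse": 3, "book": 1}
    print(form_query(context, max_terms=2))  # -> 'dog OR cat'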

In response, each information source 108 searches one or more databases and generates a set of search results, which are received by the client device 102 or the search system 114 (block 510). For example, one or more information sources 108 may return the search result summaries shown in block 510 of FIG. 5. In these example search results, the example search terms (i.e., dog and cat) appear in the search result titles and the search result bodies. In addition, other words contained in the example context model (i.e., mouse and book) appear in one of the example search results even though those words were not included in this example search query.

The client device 102 or the search system 114 then creates a model of each search result (block 512). For example, “dog:4; cat:3” may model the first example search result in block 512, and “dog:3; cat:3; mouse:3; book:4” may model the second example search result in block 512. In these examples, the modeling algorithm counted occurrences of a term in the title of a search result as having a weight of two and occurrences of a term in the body of a search result as having a weight of one. For example, the first example search result in block 512 includes one occurrence of “dog” in the title (counted as a weight of two) and two occurrences of “dog” in the body (counted as a weight of one each) for a total weight of four. It will be appreciated that any suitable method of modeling search results may be used.
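
The title-and-body weighting described in this example can be sketched directly in Python; the whitespace tokenization is a simplifying assumption.

    # Sketch of the result-model weighting described above: title
    # occurrences count double, body occurrences count once.
    from collections import Counter

    def model_search_result(title, body):
        model = Counter()
        for w in title.lower().split():
            model[w] += 2
        for w in body.lower().split():
            model[w] += 1
        return dict(model)

    print(model_search_result("dog breeds", "dog dog cat cat cat"))
    # -> {'dog': 4, 'breeds': 2, 'cat': 3}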

The client device 102 or the search system 114 then compares the search result models to the original context model (block 514) using any suitable scoring algorithm and ranks the search results based on these scores. In addition, the client device 102 or the search system 114 may eliminate certain search results, organize certain search results into categories or folders, or, in general, determine how the search results should best be presented to the user in light of the original context model. The ranked and organized search results are then displayed to the user (block 516).
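
As one example of a suitable scoring algorithm, the following Python sketch ranks the example result models against the example context model using a simple dot product; the choice of dot product is an assumption, since the disclosure permits any suitable scoring method.

    # Hypothetical sketch: score each result model against the context
    # model with a dot product and rank results by descending score.
    def dot(context_model, result_model):
        return sum(w * result_model.get(t, 0) for t, w in context_model.items())

    def rank(context_model, results):
        return sorted(results, key=lambda r: dot(context_model, r["model"]), reverse=True)

    context = {"dog": 10, "cat": 6, "mouse": 3, "book": 1}
    results = [
        {"title": "Result 1", "model": {"dog": 4, "cat": 3}},
        {"title": "Result 2", "model": {"dog": 3, "cat": 3, "mouse": 3, "book": 4}},
    ]
    print([r["title"] for r in rank(context, results)])
    # Result 1 scores 58; Result 2 scores 61, so Result 2 ranks first.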

An example screen shot 600 of ranked search results 602 being displayed in a side bar 604 to a document 606 by a client device 102 is illustrated in FIG. 6. In this example, the document 606 is a presentation slide about increasing sales of energy drinks. Accordingly, the client device 102 or the search system 114 assigned a high score to search results associated with energy drink growth (i.e., they are ranked toward the top of the combined search results).

An example screen shot 700 of a search results web page from one information source is illustrated in FIG. 7. An example screen shot 800 of a search results web page from another information source is illustrated in FIG. 8. An example screen shot of a search results side bar in accordance with an embodiment of the present system is illustrated in FIG. 9. In these examples, certain search results 702 and 704 are located in both of the prior art search results 700 and 800 and are also included in the combined search results 900. Other search results 802 and 804 are located in only one of the prior art search results 800 and are also included in the combined search results 900. Still other search results 902-908 in the combined search results 900 may not be in either of the prior art search results 700 and 800. As shown, the combined search results 900 may be in any order (i.e., not necessarily the same order as one or more of the prior art systems 700 and/or 800).

In summary, persons of ordinary skill in the art will readily appreciate that methods and apparatus for assessing, ranking, organizing, and presenting search results have been provided. The foregoing description has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the exemplary embodiments disclosed. Many modifications and variations are possible in light of the above teachings. It is intended that the scope of the invention not be limited by this detailed description of examples.

CLAIMS

1. A method of displaying search results, the method comprising: determining if a length associated with a document being accessed by a user exceeds a first threshold; determining if a ratio of non-hyperlinked words to hyperlinked words in the document exceeds a second threshold; determining if a similarity score associated with at least two different segments of the document exceeds a third threshold; sending a query to a search engine if (a) the length associated with the document exceeds the first threshold, (b) the ratio of non-hyperlinked words to hyperlinked words in the document exceeds the second threshold, and (c) the similarity score associated with at least two different segments of the document exceeds the third threshold; receiving a plurality of search results from the search engine; and generating a display indicative of the plurality of search results.
2. The method of claim 1, wherein the first threshold is based on a genre associated with the document being accessed by the user.
3. The method of claim 1, wherein the second threshold is based on a genre associated with the document being accessed by the user.
4. The method of claim 1, wherein the third threshold is based on a genre associated with the document being accessed by the user.
5. The method of claim 1, including determining if at least one of an area of interest and an area of disinterest is associated with the document being accessed by the user.

6. The method of claim 1, including determining non-document-specific context information.
7. The method of claim 1, wherein the search query is based on a first aspect of a user context, the first aspect of the user context including data indicative of text being accessed by a user, the query being different than the user context.
8. The method of claim 7, wherein the first aspect of the user context includes data indicative of the at least one task.
9. The method of claim 7, including comparing data indicative of the plurality of search results to data indicative of a second aspect of the user context to determine a plurality of relevance scores associated with the plurality of search results, the second aspect of the user context including data indicative of at least one task in which the user is engaged out of a plurality of possible user tasks.
10. The method of claim 9, wherein the second aspect of the user context is based on at least five of (a) a location of the at least one predetermined word in the text being accessed by the user, (b) a style of the at least one predetermined word in the text being accessed by the user, (c) a presence of at least one specified word in the text being accessed by the user, (d) an absence of the at least one specified word in the text being accessed by the user, (e) metadata attributes of at least a portion of the text being accessed by the user, (f) a field presented by a computer application, (g) an attribute of information being presented in the computer application, (h) an element of the computer application visible to the user, (i) a document genre, (j) a document type, (k) a type associated with the computer application, (l) a method by which the user is accessing the computer application, (m) a role in an organization, (n) a type of the organization, (o) a property of the organization, (p) a stage in a task, (q) a stage in a workflow, (r) a type of task being supported by the computer application, (s) a stage in a task being executed by the computer application, (t) a previous user behavior, (u) a topical area of interest, (v) a proportion of hyperlinked text to non-hyperlinked text, and (w) an average sentence length in the text being accessed by the user.
11. The method of claim 9, wherein the second aspect of the user context is based on (a) a style of the at least one predetermined word in the text being accessed by the user and (b) a type associated with a computer application.
12. The method of claim 9, including comparing data indicative of the plurality of search results to data indicative of at least one of the first aspect of the user context and the second aspect of the user context to determine a plurality of organization schemes, the plurality of organization schemes grouping at least a portion of the plurality of search results into at least two genres.
13. The method of claim 12, wherein generating the display indicative of the plurality of search results includes generating the display to be indicative of the plurality of organization schemes.
14. The method of claim 9, including receiving a plurality of result models from the search engine, the plurality of result models including a plurality of terms associated with the plurality of search results and a plurality of weights associated with the plurality of terms.
15. The method of claim 14, including comparing data indicative of the plurality of result models to data indicative of a user context to determine a plurality of scores associated with the plurality of search results.
16. The method of claim 1, including determining a second search query from a user context and an interim search result.
17. An apparatus for characterizing a search result as potential spam, the apparatus comprising: a processor; a memory device operatively coupled to the processor; and a network device operatively coupled to the processor; wherein the memory device stores a software program to cause the processor to: determine if a length associated with a document being accessed by a user exceeds a first threshold; determine if a ratio of non-hyperlinked words to hyperlinked words in the document exceeds a second threshold; determine if a similarity score associated with at least two different segments of the document exceeds a third threshold; send a query to a search engine if (a) the length associated with the document exceeds the first threshold, (b) the ratio of non-hyperlinked words to hyperlinked words in the document exceeds the second threshold, and (c) the similarity score associated with at least two different segments of the document exceeds the third threshold; receive a plurality of search results from the search engine; and generate a display indicative of the plurality of search results.
18. The apparatus of claim 17, wherein the search query is based on a first aspect of a user context, the first aspect of the user context including data indicative of text being accessed by a user, the query being different than the user context.
19. The apparatus of claim 18, wherein the software program is structured to cause the processor to compare data indicative of a plurality of search results to data indicative of a second aspect of the user context to determine a plurality of relevance scores associated with the plurality of search results, the second aspect of the user context including data indicative of at least one task in which the user is engaged out of a plurality of possible user tasks.
20. The apparatus of claim 19, wherein the second aspect of the user context is based on at least five of (a) a location of the at least one predetermined word in the text being accessed by the user, (b) a style of the at least one predetermined word in the text being accessed by the user, (c) a presence of at least one specified word in the text being accessed by the user, (d) an absence of the at least one specified word in the text being accessed by the user, (e) metadata attributes of at least a portion of the text being accessed by the user, (f) a field presented by a computer application, (g) an attribute of information being presented in the computer application, (h) an element of the computer application visible to the user, (i) a document genre, (j) a document type, (k) a type associated with the computer application, (l) a method by which the user is accessing the computer application, (m) a role in an organization, (n) a type of the organization, (o) a property of the organization, (p) a stage in a task, (q) a stage in a workflow, (r) a type of task being supported by the computer application, (s) a stage in a task being executed by the computer application, (t) a previous user behavior, (u) a topical area of interest, (v) a proportion of hyperlinked text to non-hyperlinked text, and (w) an average sentence length in the text being accessed by the user.
21. The apparatus of claim 19, wherein the second aspect of the user context is based on (a) a style of the at least one predetermined word in the text being accessed by the user and (b) a type associated with a computer application.
22. A computer readable medium storing a software program to cause a computing device to: determine if a length associated with a document being accessed by a user exceeds a first threshold; determine if a ratio of non-hyperlinked words to hyperlinked words in the document exceeds a second threshold; determine if a similarity score associated with at least two different segments of the document exceeds a third threshold; send a query to a search engine if (a) the length associated with the document exceeds the first threshold, (b) the ratio of non-hyperlinked words to hyperlinked words in the document exceeds the second threshold, and (c) the similarity score associated with at least two different segments of the document exceeds the third threshold; receive a plurality of search results from the search engine; and generate a display indicative of the plurality of search results.
23. The computer readable medium of claim 22, wherein the search query is based on a first aspect of a user context, the first aspect of the user context including data indicative of text being accessed by a user, the query being different than the user context.

24. The computer readable medium of claim 23, wherein the software program is structured to cause the computing device to compare data indicative of the plurality of search results to data indicative of a second aspect of the user context to determine a plurality of relevance scores associated with the plurality of search results, the second aspect of the user context including data indicative of at least one task in which the user is engaged out of a plurality of possible user tasks.
25. The computer readable medium of claim 24, wherein the second aspect of the user context is based on at least five of (a) a location of the at least one predetermined word in the text being accessed by the user, (b) a style of the at least one predetermined word in the text being accessed by the user, (c) a presence of at least one specified word in the text being accessed by the user, (d) an absence of the at least one specified word in the text being accessed by the user, (e) metadata attributes of at least a portion of the text being accessed by the user, (f) a field presented by a computer application, (g) an attribute of information being presented in the computer application, (h) an element of the computer application visible to the user, (i) a document genre, (j) a document type, (k) a type associated with the computer application, (l) a method by which the user is accessing the computer application, (m) a role in an organization, (n) a type of the organization, (o) a property of the organization, (p) a stage in a task, (q) a stage in a workflow, (r) a type of task being supported by the computer application, (s) a stage in a task being executed by the computer application, (t) a previous user behavior, (u) a topical area of interest, (v) a proportion of hyperlinked text to non-hyperlinked text, and (w) an average sentence length in the text being accessed by the user.
26. The computer readable medium of claim 24, wherein the second aspect of the user context is based on (a) a style of the at least one predetermined word in the text being accessed by the user and (b) a type associated with a computer application.
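
Purely as an illustrative appendix, the three-threshold gate recited in claim 1 might be sketched in Python as follows; the threshold values, parameter names, and handling of documents with no hyperlinks are assumptions made for the example and are not part of the claims.

    # Hypothetical sketch of the query-issuance gate recited in claim 1:
    # a query is sent only when document length, the ratio of
    # non-hyperlinked to hyperlinked words, and inter-segment similarity
    # all exceed their (possibly genre-dependent) thresholds.
    def should_issue_query(doc_length, plain_words, linked_words,
                           segment_similarity,
                           t_length=100, t_ratio=5.0, t_similarity=0.3):
        if linked_words == 0:
            ratio = float("inf")  # no hyperlinks at all: ratio is unbounded
        else:
            ratio = plain_words / linked_words
        return (doc_length > t_length
                and ratio > t_ratio
                and segment_similarity > t_similarity)

    print(should_issue_query(doc_length=850, plain_words=780,
                             linked_words=70, segment_similarity=0.42))
    # -> True: all three thresholds are exceeded, so a query is sent.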